Overview

  • a network is a collection of vertices joined by edges
  • vertices and edges are also called nodes and links in computer science and actors and ties in sociology
  • in mathematics, a network is called graph and it is typically represented as a square matrix
  • given a graph \(G\) with \(n\) nodes numbered from \(1\) to \(n\), the adjacency matrix \(A = (a_{i,j})\) of \(G\) is a square \(n \times n\) matrix such that \(a_{i,j} = 1\) if there exists an edge joining nodes \(i\) and \(j\), and \(a_{i,j} = 0\) otherwise

Undirected graphs

  • a graph is undirected if edges have no direction: if there is an edge from \(i\) to \(j\), then there is also an edge from \(j\) to \(i\)
  • this means that the adjacency matrix of an undirected graph is symmetric: \(a_{i,j} = a_{j,i}\) for every pair \(i,j\) (or \(A = A^T\), where \(A^T\) is the transpose of \(A\))
  • if there is and edge between \(i\) and \(j\), then \(i\) and \(j\) are said to be neighbors
  • the neighbors of node \(i\) are the \(1\) of row (or column) \(i\) of \(A\)

Undirected graphs

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    0    0    1    0
## [2,]    1    0    1    1    0    0
## [3,]    0    1    0    1    1    1
## [4,]    0    1    1    0    0    0
## [5,]    1    0    1    0    0    0
## [6,]    0    0    1    0    0    0

Directed graphs

  • a graph is directed if edges have a direction: if there is an edge from \(i\) to \(j\), then there might be or not the inverse edge
  • this means that the adjacency matrix of an directed graph is not necessarily symmetric
  • if there is and edge from \(i\) to \(j\), then \(i\) is a predecessor of \(j\) and \(j\) is a successor of \(i\)
  • the predecessors of node \(i\) are the \(1\) on column \(i\) of \(A\) and the successors of node \(i\) are the \(1\) on row \(i\) of \(A\)
  • self-loops or self-edges are edges \((i,i)\) from a node \(i\) to itself: they correspond to diagonal entries in the adjacency matrix

Directed graphs

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    0    1    1    0    0
## [2,]    0    0    0    0    1    0
## [3,]    0    1    0    0    0    0
## [4,]    0    0    0    1    1    0
## [5,]    0    0    1    1    0    1
## [6,]    0    0    0    0    0    0

Unweighted and weighted graphs

In some situations it is useful to represent edges as having a strength, weight, or value to them:

  1. in the Internet edges might have weights representing the amount of data flowing along them or their bandwidth
  2. in a food web predator-prey interactions might have weights measuring total energy flow between prey and predator
  3. in a social network connections might have weights representing the sign and intensity of the relationship: positive weights denote friendship and negative ones represent animosity
  • Any SCI related examples?

Unweighted and weighted graphs

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    1    0    0    5    0
## [2,]    1    0    2    8    0    0
## [3,]    0    2    0    4    5    6
## [4,]    0    8    4    0    0    0
## [5,]    5    0    5    0    0    0
## [6,]    0    0    6    0    0    0

Bipartite graphs

  • a bipartite graph is a graph where there are two kinds of vertices, and edges run from nodes of different types only
  • any network in which the vertices are connected together by common membership of groups of some kind can be represented in this way
  • in sociology such networks are called affiliation networks: for instance, scientists coauthoring papers or film actors appearing together in films
  • a bipartite network is typically represented with an incidence matrix: if \(n\) is the number of actors in the network and \(g\) is the number of groups, then the incidence matrix \(B\) is a \(g \times n\) matrix having elements \(B_{i,j} = 1\) if group \(i\) contains participant \(j\) and \(B_{i,j} = 0\) otherwise

Bipartite graphs

##    1 2 3 4 5 6 7
## 8  1 1 1 0 0 0 0
## 9  0 1 1 1 1 0 0
## 10 0 0 0 1 0 1 0
## 11 0 0 0 0 1 1 1

Tidy network data?

  • there is a discrepancy between network data and the tidy data idea, in that network data cannot in any meaningful way be encoded as a single tidy data frame
  • on the other hand, both node and edge data by itself fits very well within the tidy concept as each node and edge is, in a sense, a single observation
  • thus, a close approximation of tidyness for network data is two tidy data frames, one describing the node data and one describing the edge data

tidygraph

  • tidygraph is an entry into the tidyverse that provides a tidy framework for network (graph) data
  • tidygraph provides an approach to manipulate node and edge data frames using the interface defined in the dplyr package
  • moreover it provides tidy interfaces to a lot of common graph algorithms, including igraph network analysis toolkit
  • it is developed by Thomas Lin Pedersen

ggraph

  • ggraph is an extension of ggplot2 that implements a visualization grammar for network data
  • it provides a huge variety of geoms for drawing nodes and edges, along with an assortment of layouts making it possible to produce a very wide range of network visualization types
  • while tidygraph provides a manipulation and analysis grammar for network data (like dplyr for tabular data), ggraph offers a visualization grammar (like ggplot for tabular data)
  • it is developed by Thomas Lin Pedersen

A full example: dplyr, tidygraph and ggraph

library(dplyr)
library(ggraph)
library(tidygraph)

# setting theme_graph 
set_graph_style()

# a graph of highschool friendships
head(highschool)
##   from to year
## 1    1 14 1957
## 2    1 15 1957
## 3    1 21 1957
## 4    1 54 1957
## 5    1 55 1957
## 6    2 21 1957
# create the graph and add popularity
graph <- as_tbl_graph(highschool) %>% 
    mutate(Popularity = centrality_degree(mode = "in"))

# plot using ggraph
ggraph(graph, layout = "kk") + 
    geom_edge_link() + 
    geom_node_point(aes(size = Popularity)) + 
    facet_edges(~year) + 
    theme_graph(foreground = "steelblue", fg_text_colour = "white")

Read the graph with tidygraph

Let’s read a dolphin network:

  1. a set of nodes representing dolphins (dolphin_nodes.csv)
  2. a set of edges representing ties among dolphins (dolphin_edges.csv)
library(readr)

nodes <-  read_csv("data/dolphin_nodes.csv")
edges <-  read_csv("data/dolphin_edges.csv")

nodes
## # A tibble: 62 x 2
##    name       sex  
##    <chr>      <chr>
##  1 Beak       M    
##  2 Beescratch M    
##  3 Bumper     M    
##  4 CCL        F    
##  5 Cross      M    
##  6 DN16       F    
##  7 DN21       M    
##  8 DN63       M    
##  9 Double     F    
## 10 Feather    M    
## # … with 52 more rows
edges
## # A tibble: 159 x 2
##        x     y
##    <dbl> <dbl>
##  1     4     9
##  2     6    10
##  3     7    10
##  4     1    11
##  5     3    11
##  6     6    14
##  7     7    14
##  8    10    14
##  9     1    15
## 10     4    15
## # … with 149 more rows

About the Data

David Lusseau, a researcher at the University of Aberdeen, observed the group of dolphins of Doubtful Sound. Every time a school of dolphins was encountered in the fjord between 1995 and 2001, each adult member of the school was photographed and identified from natural markings on the dorsal fin. This information was utilised to determine how often two individuals were seen together. Read the full story.

Tidygraph data structure

Package tidygraph represents the graph as a pair of data frames:

  • a data frame for nodes containing information about the nodes in the graph
  • a data frame for edges containing information about the edges in the graph. The terminal nodes of each edge must either be encoded in a to and from column, or in the two first columns, as integers. These integers refer to nodes index.

# add edge type
edges <-  edges %>% 
  mutate(type = sample(c("love", "friendship"), 
                       nrow(edges), 
                       replace = TRUE) )

# make a tidy graph
dolphin <-  tbl_graph(nodes = nodes, edges = edges, directed = FALSE)
dolphin
## # A tbl_graph: 62 nodes and 159 edges
## #
## # An undirected simple graph with 1 component
## #
## # Node Data: 62 x 2 (active)
##   name       sex  
##   <chr>      <chr>
## 1 Beak       M    
## 2 Beescratch M    
## 3 Bumper     M    
## 4 CCL        F    
## 5 Cross      M    
## 6 DN16       F    
## # … with 56 more rows
## #
## # Edge Data: 159 x 3
##    from    to type      
##   <int> <int> <chr>     
## 1     4     9 friendship
## 2     6    10 love      
## 3     7    10 love      
## # … with 156 more rows

How to pull back nodes and edges information?

Pipe friendly version

# extract node and edge data frames from the graph
dolphin %>% 
  as.list()

# extract node data frame from the graph

dolphin %>% 
  activate(nodes) %>% 
  as_tibble()

# extract edge data frame from the graph
dolphin %>% 
  activate(edges) %>% 
  as_tibble()

Tidygraph visualisation with ggraph

Ggraph builds upon three core concepts that are quite easy to understand:

  • the layout defines how nodes are placed on the plot. ggraph has access to all layout functions available in igraph and much more
  • the nodes are the connected entities in the graph structure. These can be plotted using the geom_node_*() family of geoms
  • the edges are the connections between the entities in the graph structure. These can be visualized using the geom_edge_*() family of geoms

ggraph basics

# setting theme_graph 
set_graph_style()

# basic node plot
ggraph(dolphin) + 
  geom_node_point()

# basic graph plot
ggraph(dolphin) + 
  geom_edge_link() + 
  geom_node_point()

# plot edge type
ggraph(dolphin) + 
  geom_edge_link(aes(color = type)) + 
  geom_node_point()

# plot node sex
ggraph(dolphin) + 
  geom_edge_link(aes(color = type)) + 
  geom_node_point(aes(shape = sex))

# plot node name
ggraph(dolphin) + 
  geom_edge_link() + 
  geom_node_point() + 
  geom_node_text(aes(label = name), repel=TRUE)

Faceting

Faceting allows to create sub-plots according to the values of a qualitative attribute on nodes or edges.

# setting theme_graph 
set_graph_style()


# facet edges by type
ggraph(dolphin) + 
  geom_edge_link(aes(color = type)) + 
  geom_node_point() +
  facet_edges(~type)

# facet both nodes and edges
ggraph(dolphin) + 
  geom_edge_link() + 
  geom_node_point() +
  facet_graph(type~sex) + 
  th_foreground(border = TRUE)

Directed graphs

# setting theme_graph 
set_graph_style()

# direcred graphs
package <-  tibble(
  name = c("igraph", "ggraph", "dplyr", "ggplot", "tidygraph")
)

tie <-  tibble(
  from = c("igraph", "ggplot", "igraph", "dplyr", "ggraph"),
  to =   c("tidygraph", "ggraph", "tidygraph", "tidygraph", "tidygraph")
)

tidy <-  tbl_graph(nodes = package, edges = tie, directed = TRUE)


# use arrows for directions
ggraph(tidy, layout = "graphopt") + 
    geom_edge_link(aes(start_cap = label_rect(node1.name), 
                       end_cap = label_rect(node2.name)), 
                   arrow = arrow(type = "closed", 
                                 length = unit(3, "mm"))) + 
    geom_node_text(aes(label = name))

# use edge alpha to indicate direction, 
# direction is from lighter to darker node
ggraph(tidy, layout = 'graphopt') + 
    geom_edge_link(aes(start_cap = label_rect(node1.name), 
                       end_cap = label_rect(node2.name), 
                       alpha = stat(index)), 
                   show.legend = FALSE) + 
    geom_node_text(aes(label = name))

Hierarchical layouts

# setting theme_graph 
set_graph_style()

# This dataset contains the graph that describes the class 
# hierarchy for the Flare visualization library.
head(flare$vertices)
##                                           name size             shortName
## 1 flare.analytics.cluster.AgglomerativeCluster 3938  AgglomerativeCluster
## 2   flare.analytics.cluster.CommunityStructure 3812    CommunityStructure
## 3  flare.analytics.cluster.HierarchicalCluster 6714   HierarchicalCluster
## 4            flare.analytics.cluster.MergeEdge  743             MergeEdge
## 5  flare.analytics.graph.BetweennessCentrality 3534 BetweennessCentrality
## 6           flare.analytics.graph.LinkDistance 5731          LinkDistance
head(flare$edges)
##                      from                                           to
## 1 flare.analytics.cluster flare.analytics.cluster.AgglomerativeCluster
## 2 flare.analytics.cluster   flare.analytics.cluster.CommunityStructure
## 3 flare.analytics.cluster  flare.analytics.cluster.HierarchicalCluster
## 4 flare.analytics.cluster            flare.analytics.cluster.MergeEdge
## 5   flare.analytics.graph  flare.analytics.graph.BetweennessCentrality
## 6   flare.analytics.graph           flare.analytics.graph.LinkDistance
# flare class hierarchy
graph = tbl_graph(edges = flare$edges, nodes = flare$vertices)

# dendrogram
ggraph(graph, layout = "dendrogram") + 
  geom_edge_diagonal()

# circular dendrogram
ggraph(graph, layout = "dendrogram", circular = TRUE) + 
  geom_edge_diagonal() + 
  geom_node_point(aes(filter = leaf)) + 
  coord_fixed()

# rectangular tree map
ggraph(graph, layout = "treemap", weight = size) + 
  geom_node_tile(aes(fill = depth), size = 0.25)

# circular tree map
ggraph(graph, layout = "circlepack", weight = size) + 
  geom_node_circle(aes(fill = depth), size = 0.25, n = 50) + 
  coord_fixed()

# icicle
ggraph(graph, layout = "partition") + 
  geom_node_tile(aes(y = -y, fill = depth))

# sunburst (circular icicle)
ggraph(graph, layout = "partition", circular = TRUE) +
  geom_node_arc_bar(aes(fill = depth)) +
  coord_fixed()

Network analysis with tidygraph

  • the data frame graph representation can be easily augmented with metrics computed on the graph
  • before computing a metric on nodes or edges use the activate() function to activate either node or edge data frames
  • use dplyr verbs filter, arrange and mutate to manipulate the graph

Network analysis with tidygraph

dolphin <-  dolphin %>% 
  activate(nodes) %>% 
  mutate(degree = centrality_degree()) %>% 
  filter(degree > 0) %>% 
  arrange(-degree) %>% 
  activate(edges) %>% 
  mutate(betweenness = centrality_edge_betweenness()) %>% 
  arrange(-betweenness)

dolphin
## # A tbl_graph: 62 nodes and 159 edges
## #
## # An undirected simple graph with 1 component
## #
## # Edge Data: 159 x 4 (active)
##    from    to type       betweenness
##   <int> <int> <chr>            <dbl>
## 1    10    17 friendship        283.
## 2    13    29 friendship        219.
## 3     6    10 love              184.
## 4     2    17 friendship        181.
## 5    17    48 friendship        173.
## 6     9    48 love              146.
## # … with 153 more rows
## #
## # Node Data: 62 x 3
##   name    sex   degree
##   <chr>   <chr>  <dbl>
## 1 Grin    F         12
## 2 SN4     F         11
## 3 Topless M         11
## # … with 59 more rows

Analyse and visualize network: centrality

Packages tidygraph and ggraph can be pipelined to perform analysis and visualization tasks in one go.

# setting theme_graph 
set_graph_style()


dolphin %>% 
  activate(nodes) %>%
  mutate(pagerank = centrality_pagerank()) %>%
  activate(edges) %>%
  mutate(betweenness = centrality_edge_betweenness()) %>%
  ggraph() +
  geom_edge_link(aes(alpha = betweenness)) +
  geom_node_point(aes(size = pagerank, colour = pagerank)) + 
  # discrete colour legend
  scale_color_gradient(guide = "legend")

# or even less typing
ggraph(dolphin) +
  geom_edge_link(aes(alpha = centrality_edge_betweenness())) +
  geom_node_point(aes(size = centrality_pagerank(), 
                      colour = centrality_pagerank())) + 
  scale_color_gradient(guide = "legend")

Analyse and visualize network: communities

# setting theme_graph 
set_graph_style()


# visualize communities of nodes
dolphin %>% 
  activate(nodes) %>%
  mutate(community = as.factor(group_louvain())) %>% 
  ggraph() + 
  geom_edge_link() + 
  geom_node_point(aes(colour = community), size = 5)

Analyse and visualize network: graph metrics

Compute degree, closeness, betweenness and PageRank on the network. Are top-ranked dolphins male or female?

dolphin <- dolphin %>% 
  activate(nodes) %>% 
  mutate(degree = centrality_degree(), 
         pagerank = centrality_pagerank(), 
         closeness = centrality_closeness(),
         betweenness = centrality_betweenness())

actors <-  as.list(dolphin)$nodes

actors %>% 
  group_by(sex) %>% 
  summarise(avg_degree = mean(degree),
            avg_pagerank = mean(pagerank),
            avg_closeness = mean(closeness),
            avg_betweenness = mean(betweenness))
## # A tibble: 3 x 5
##   sex   avg_degree avg_pagerank avg_closeness avg_betweenness
##   <chr>      <dbl>        <dbl>         <dbl>           <dbl>
## 1 F           5.6        0.0169       0.00529            86.2
## 2 M           5.06       0.0162       0.00493            66.8
## 3 U           2.75       0.0104       0.00431            24.8

Analyse and visualize network: graph metrics

Study the correlation among the four centrality measures.

M <-  cbind(degree = actors$degree,
          pagerank = actors$pagerank,
          closeness = actors$closeness,
          betweenness = actors$betweenness)


(corM <-  cor(M))
##                degree  pagerank closeness betweenness
## degree      1.0000000 0.9830948 0.7126718   0.5902140
## pagerank    0.9830948 1.0000000 0.6217912   0.6021920
## closeness   0.7126718 0.6217912 1.0000000   0.6657346
## betweenness 0.5902140 0.6021920 0.6657346   1.0000000
corrplot(corM, method = "ellipse")

Thanks for your attetion!

Filippo Chiarello (filippo.chiarello@unipi.it)[filippo.chiarello@unipi.it]